Optimal switching policies using coarse timesteppers. 
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Abstract 

We present a computer-assisted approach to approximat- 
ing coarse optimal switching policies for systems described 
by microscopic/stochastic evolution rules. The "coarse 
timestepper" constitutes a bridge between the underlying 
kinetic Monte Carlo simulation and traditional, continuum 
numerical optimization techniques formulated in discrete 
time. The approach is illustrated through a simplified kinetic 
Monte Carlo simulation of NO reduction on a Pt catalyst: 
a switch between two coexisting stable steady states is im- 
plemented by minimal manipulation of a system parameter. 



3, 9] and feedback control [14], thus circumventing the 
derivation of explicit macroscopic evolution equations. 

Here we demonstrate the extension of this computational en- 
abling technology to coarse optimization tasks. In particu- 
lar, we computationally approximate coarse optimal switch- 
ing policies for (the expected behavior of) a kinetic Monte 
Carlo simplified model of a catalytic chemical reaction. The 
system is characterized by two "coarse-grained" stable sta- 
tionary states. We seek optimal (for a particular definition 
of the cost function) parameter variation policies that will 
switch the kinetic Monte Carlo simulation from one steady 
state to the other within a finite time interval. 
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1 Introduction 

The search for optimal time-varying operation protocols for 
chemically reacting systems has remained an exciting re- 
search subject for many decades. It has been receiving in- 
creased attention recently, both for lumped and distributed 
in space systems (e.g. [17, 1]); as sensing and actuation 
become increasingly more resolved in space and time, spa- 
tiotemporally complicated operating policies can be consid- 
ered (e.g. [19]). Proposed computational approaches may 
involve (a) solution of the temporally discretized problem 
(both for the process and for the operating variable (s)) si- 
multaneously, using large sparse linear algebra techniques 
(e.g [17]); (b) formulations involving direct integration of 
the model equations in time, keeping track of possible con- 
straint violations [18, 2]; or dynamic programming formu- 
lations. Knowledge of a macroscopic process model, in the 
form of macroscopic mass balances closed through appropri- 
ate constitutive expressions -such as chemical kinetic rate 
formulas-, is a fundamental prerequisite for these computa- 
tional solution strategies. 

' In contemporary engineering modeling, however, we are of- 
ten faced with problems for which the available physical de- 
scription is in the form of atomistic / stochastic evolution 
rules (kinetic Monte Carlo, Lattice Boltzmann, molecular 
dynamics, Brownian dynamics) while the design, optimiza- 
tion or control is required at a coarse-grained, macroscopic 
level. Over the last few years we have been developing a com- 
putational approach enabling microscopic / stochastic sim- 
ulators to directly perform system-level tasks, such as coarse 
integration, stability analysis, bifurcation/continuation[16, 
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2 Process description 

We investigate microscopic/stochastic processes for which we 
believe that the coarse-grained, expected dynamics can be 
adequately approximated by a model of the general type 



F{x,p) 



(1) 



but where the right-hand-side of the evolution law, F, is not 

available in closed form. Here, x(t) E H™ is a state variable 

• ■ ■ i ■ ■ , dx 

vector, t is the time, x is the time derivative of x. —-, and 

' dt 

p G V m is the vector of process parameters {V m C Et m is the 
subset of available values of the process parameters). The 
state variables x(t) are typically a few lower order moments 
of an atomistically evolving distribution (e.g. a concentra- 
tion, or, in our example, a surface coverage, the zeroth mo- 
ment of the distribution of adsorbates on the surface). The 
(unavailable) equation for the expected behavior of the pro- 
cess may possess, at fixed process parameter values, one or 
more steady states x SSy i. 

2.1 The coarse timestepper 

The basis of our approach is the computation of a determin- 
istic optimal policy for the expected dynamics of the process 
(the "coarse-grained dynamics") circumventing the deriva- 
tion of a closed form evolution equation for these dynamics. 
This deterministic policy for the coarse-grained behavior will 
then be applied to individual realizations of the process. As 
we will explain below, it is convenient in our approach to 
reformulate the coarse dynamics in discrete rather than con- 
tinuous time. The coarse evolution law then takes the form: 



G T (xi,p{t)), te(ti,ti + T] 



(2) 



where xi is the state at the beginning of z-th time interval, 
ti, T is the time interval duration (reporting horizon), Gt 
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represents the evolution of Eq.l for a process parameter pro- 
file p(t), initialized at Xi and evolved for time T, arriving at 
state Xi+i at ti + T = tj+i. 

Conventional algorithms for the solution of optimization 
problems involving Gt incorporate frequent calls to a sub- 
routine that evaluates Gt and/or the action of its deriva- 
tives on initial conditions. Equation- free based algorithms 
estimate the same quantities by short bursts of (possibly en- 
sembles of) microscopic simulations conditioned on the same 
macroscopic initial conditions. The coarse time-stepper con- 
stitutes such an estimate of the discrete-time, macroscopic 
input-output map Gt obtained via the kinetic Monte Carlo 
simulator. Through a lifting operator the macroscopic initial 
condition is translated into several consistent microscopic 
initial conditions (distributions conditioned on a few of their 
lower moments). This ensemble of microscopic initial con- 
ditions is then evolved microscopically, in an easily paral- 
lclizablc fashion (one consistent realization per CPU). The 
results are averaged through a restriction operator back to 
a macroscopic "output" ; it is precisely this output that tra- 
ditional algorithms simply compute through function evalu- 
ations when the evolution equations are available in closed 
form. As extensively discussed in [11, 9], part of the micro- 
scopic evolution is spent in a "healing" process - the higher 
moments, which have been initialized "wrong" quickly relax 
to functionals of the low order moments (our state variables). 
A separation of time scales (fast relaxation of the high mo- 
ments to functionals of the low ones, and slow -deterministic- 
evolution of the low ones) underpins the existence of a deter- 
ministic coarse grained evolution law. The dynamics of the 
evolving microscopic distribution moments constitute thus 
a singularly perturbed system. The requirement of finite 
time microscopic evolution (necessary to the moment heal- 
ing process) conforms with the discrete-time formulation of 
the coarse optimization problem, which is common in many 
optimization algorithms (see section 3 below). 

2.2 Numerical experiment 

We illustrate the proposed combination of traditional op- 
timization techniques with stochastic simulators through a 
kinetic Monte Carlo realization (using the stochastic simula- 
tion algorithm, proposed by Gillespie [4, 5]) of a drastically 
simplified kinetic model of NO reduction by H2 on Pt sur- 
faces: 

riff 

^-=a{l-6)- 1 6-k{l-6) 2 6. (3) 

Here 9 describes the surface coverage of adsorbed NO, a, 
7 are the NO adsorption and desorption rate constants re- 
spectively, and k is the reaction rate constant. The reaction 
term is third order due to the need for two free adjacent 
sites for the adsorption of H 2 - In Figure 1 we present the 
deterministic bifurcation diagram in the form of coverage 9 SS 
at steady-state as a function of k for a = 1 and 7 = 0.01. 
We observe a range of k values for which the system exhibits 
multiple steady-states; the higher and lower ones are locally 
stable, while the middle one is unstable. 
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Figure 1: Bifurcation diagram of 9 at steady state with respect 
to k. 

A description of the use of the coarse KMC timestepper in 
obtaining "coarse" versions of this bifurcation diagram may 
be found in [11]. 

3 Coarse computational optimization 

Computing the temporal profiles of the process parameters 
that cause the transition of the coarse-grained system from 
an initial stationary state to a different final one can be for- 
mulated as an optimization problem; the objective is to min- 
imize an integral cost function over time: 

/•OO 

min / Q(t,x,p)dt (4) 
p(t)eP m io 

where Q is a continuous scalar cost function. The constraints 
for this coarse optimization problem are the (unavailable) 
coarse process evolution equations Eq. 1 , the initialization at 
one of the coarse stationary states for p = p ss , the require- 
ment of termination at a different stationary state for the 
same process parameter value p ss , as well as, possibly, other 
inequality constraints g(x,p): 

x-F(x(t),p(t)) = 0, g{x, P )<0 
p(0)=p ss , limp(t)=p ss (5) 

x(0) = x SSi i, lim x(t) = x SSy2 

t—*oo 

This is an infinite dimensional problem in continuous time. 
Direct solution methods are based on the calculus of vari- 
ations. Semi-infinite programming approaches provide us 
with the necessary mathematical tools to solve such prob- 
lems with finite time horizon [6], through discretization of 
the temporal domain. 

Another approach consists of approximating this problem 
through a finite time horizon problem with a final state 
penalty, which is (in our case) subsequently solved in discrete 
time. This results in a finite dimensional, generally nonlin- 
ear, optimization problem which -if the coarse equations are 
available- could be solved using available optimization tech- 
niques. For example, discretizing the process time [0, t/] 
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in N time intervals of length T (not necessarily constant) 
and assuming that the process parameters remain constant 
within each interval p(t) = Pi+i, Vt G (U,ti + T], results in 
the following optimization program with (N + l) xn + N x to 
variables and n x (N + 1) + m equality constraints: 

N -iT 

min / Q d (t,x l ,x l -i,p)dt 

+W(R(\x N -x ss ^\-e)) 
s.t. 

(6) 

g d {x , . . .,x N ,p) < 0, 
Xi = G T {xi-i,p) = 0, i = l,...,N, x = x ss ,i 
N t 1 

P {t) = $>iii(- - i - -) + Pss H(-t). 

Here is analogous to the Q function for the discrete val- 
ues of the state, g d is similarly analogous to g, W(-) is a 
class K scalar function, R(-) is the ramp function and e is 
a value for which \xn — x ss ^ 2 \ < e <=> W = 0, II(-) is 
the standard boxcar function and H(-) denotes the Heavi- 
side function. Appropriate final state penalty contributions 
to the cost function take the place of final state constraints 
at infinite time as stated in the original formulation of the 
problem; the final state is restricted to be (in finite time) 
within a neighborhood of the final stationary state. This 
fully discrete-time formulation is ideally suited for linking 
with a coarse timestepper. The optimization problem formu- 
lated in Eq.6 can further be reduced in size by discretizing 
only the process parameter temporal behavior, resulting in 
the following formulation (coined control vector parameteri- 
zation [18]) with Nxm variables and to equality constraints: 

min / Q d (t,Xi,Xi-i,p)dt 
Piepm ^J { ._ 1)T 

+W(R(\x N -x ss , 2 \-e)) 
s.t. 

g d (x , ■ ■ -,xn,p) < 0, 

(7) 

N 

P (t) = ^>n(| - i - l -)+ Pss H(-t) 

i=l 

where 

Xi = G T (xi-i,p) = 0, i = 1,2, ...,N 
xq — x ss ^\ . 

The difference with the previous formulation lies in that the 
variables Xi , i = 1 , . . . , N are solved for internally, to reduce 
the size of the optimization problem. In cases where the 
explicit form of Eq.l is unavailable, the state evolution is 
provided through direct simulation of the system. 

3.1 Solution methodology 

Traditional discrete time optimization schemes would call, 
during the solution process, a numerical integration subrou- 



tine for the system equations (and possibly variational inte- 
grations for the estimation of derivatives). This call is now 
substituted by the coarse timestepper; the most important 
numerical issue is that of noise, inherent in the lifting process 
and the stochastic simulations, and the variance reduction 
necessary to estimate the state or its various derivatives. 
Simulations of different physical size systems (different lat- 
tice sizes in our simulation) are characterized by different lev- 
els of noise, while the expectation asymptotically approaches 
a limiting value for infinite system size. For the type of sim- 
ulations in this paper, changing the physical size of the sim- 
ulated domain on the one hand, and changing the number 
of copies of the simulation on the other, have comparable 
effects in reducing the variance of the simulation output; 
this, however, is not generally the case in KMC simulations. 
When a switching policy for a particular physical size sys- 
tem is required (e.g. for nanoscopic reacting systems, such 
as the chemical oscillations on Field Emitter tips [15]) vari- 
ance reduction can be affected through a larger ensemble of 
consistent microscopic initializations (see also [12] for a vari- 
ance reduction technique using stochastic calculus). Simula- 
tion noise affects both function evaluations and (numerical) 
coarse derivative evaluations, and thus becomes an impor- 
tant element of the approach. It is interesting, however, to 
observe that massively parallel computation can reduce the 
wall clock time required for the computation distributing dif- 
ferent microscopic initializations to different CPUs. 

Due to the presence of noise in the coarse timestepper re- 
sults, the use of optimization algorithms that are specifically 
designed to be insensitive to noise becomes necessary (it is 
well known that the numerical estimation of derivatives is 
highly susceptible to noise). A class of direct search algo- 
rithms that fulfill this criterion are ones that use only func- 
tion evaluations to search for the optimum such as iterative 
dynamic programming [13], Luus-Jaakola [10], Ncldcr-Mead 
and Hooke-Jeeves algorithms. Algorithms that compute lo- 
cal and bounded approximations of the Jacobian matrix of 
the cost function with respect to the process parameters have 
also been developed, such as the implicit filtering algorithm. 
The reader may refer to [8] for a review of these methods. 

In all the above iterative algorithms, an appropriate initial 
guess of the process policy profile and a set of search direc- 
tions for the to x N variables is required. The usual set of 
search directions (also used in our numerical experiment) is 
the unitary basis for H mxJv . The components of the search 
algorithms also include a vector, the elements of which are 
the maximum distances the algorithm should venture from 
the current position during the new direction search step 
at each iteration. These perturbation distances are called 
scales, and appear in decreasing order. 

Selection of the scales in the search algorithm requires esti- 
mates of the noise magnitude. A scale that is too small not 
only leads to increased computation time with no apparent 
advantages, but, especially in the case of algorithms that 
compute approximations of the Jacobian, can lead to grossly 
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erroneous results for the search. The following timestepper 
protocol yields, in our case, simulation results of known vari- 
ance magnitude: 

1. Initialize timestepper. Set: 

• lattice size Ni, 

• solutions sampling size M r , 

• variance metric d as the square root of the vari- 
ance, subsequently divided by the average value 
of the sample. 

• variance magnitude limit d max and 

• maximum sample size M max . 

2. In each time step, the timestepper: 

(a) simulates system for the desired parameter value 
M r times (this can be affected through different 
microscopic initializations and/or different ran- 
dom seeds in the Monte Carlo process) and com- 
putes d. 

(b) If d > d max and M r < M max , increases M r , re- 
peat step (a) 

(c) If d < d max and M r > M min decreases M r , con- 
tinue with next time step. 

Adaptively adjusting the ensemble of realizations thus helps 
in the choice of scales. Optimization algorithms that are 
based only on function evaluations are not guaranteed to 
converge to a global minimum, or even to a minimum. This 
is due to the fact that we do not compute the necessary opti- 
mally conditions in the neighborhood of the result, and also 
because a search direction may have been neglected. Once 
the optimization algorithm has converged and produced a 
policy profile, it is prudent to restart it with initial guess 
the result of the previous search. This causes a large per- 
turbation, which may lead to a better optimum. Once two 
consecutive runs have produced the same result, the free 
variable profile is declared "optimal over all scales" [8]. 

4 Numerical Results 

The approach outlined in the previous section was applied 
towards the computation of an optimal switching policy (be- 
tween different stationary states) in our simplified NO reduc- 
tion model. Specifically, our objective is to switch from the 
9 SSiS , to the 9 ss j locally stable stationary states (travers- 
ing the unstable stationary state 9 S s,i) with the minimum 
possible effort. The cost function in Eq.7 was (somewhat 
arbitrarily) defined as: 

N 

Q = {k{t) - k ss ) 2 (l - 0.3e-*)r£j(t - iT)) 

»=o 

W = 50[1 - exp(R(\B(t N ) - B ssJ \ - e))] 

where e = 0.05 and R, S denote the ramp and Dirac function 
respectively. We assume that the single "manipulated vari- 
able" for this problem (the process parameter p of Eq.l) is 



Table 1: Process parameters 



Parameter 


Value 


Steady states 

j ^^^^^^ 


T 


0.25 


0„ „ 3301 


N 


20 


9 SSil 0.6803 


tf 


5 


e ssJ 0.9896 




0.45 




a 


1.0 




1 


0.01 





the reaction rate constant k. The process parameters used 
for the specific optimization problem are presented in Table 
1. 

KMC simulations based on the Gillespie algorithm formed 
the basis of the coarse time-stepper that was used to estimate 
the coarse system response. A variety of lattice and sample 
sizes were used in estimating the dynamic behavior of the 
system. Hooke- Jeeves was the algorithm we chose to search 
for the optimal profile. In Table 2 we present the value of 
the cost function for the computed optimal parameter pro- 
files, through which the effect of the error in the computed 
optimal profile is implicitly quantified. As the lattice size iVj 
increases the KMC simulations expected profiles asymptot- 
ically converge to the profile obtained at the limit. For the 
simulations presented the expectation dependence to iVj was 
insignificant. A secondary advantage of the increased lattice 

— 1/2 

size was the variance reduction since d oc N l . This leads, 
at large Ni, to results from the search for the optimal pro- 
file of k that are closer to ones obtained when we use the 
timestepper of the actual deterministic problem (for com- 
parison purposes). The main variance reduction parameter 
in the presented simulations was the sample size M r used to 
compute the coarse response; when M r increases in the KMC 
simulations, the noise magnitude d decreases, as expected, 

— 1/2 

since d oc M r ' . 



Table 2: Hooke- Jeeves search results 



Model 


Lattice 


Runs 


Objective 


t* N 


Legacy 






10.3709 


129 


KMC 


100 x 100 


200 


10.5418 


674 


KMC 


200 x 200 


400 


10.4155 


4673 


KMC 


300 x 300 


600 


10.3973 


16300 


KMC 


400 x 400 


800 


10.3815 


38469 


KMC 


500 x 500 


1000 


10.3811 


75307 



* single CPU pentium IV at 2.4 GHz 



The Gillespie algorithm was chosen so that the coarse be- 
havior is known at the large system size limit, and the 
noisy timestepper optimization results can be compared to 
it. The objective value convergence to the computed opti- 
mal value from the ODE "direct simulator" comes at the 
cost of increased computational work. The use of paral- 
lel computing can, as we discussed, drastically decrease the 
necessary wall clock time. In Figure 2 we present the re- 
sults for Ni = 500 x 500 and M r — 1000 and compare them 
to the direct ODE simulation results. A near-optimal pa- 
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rameter profile is arrived at, due to the combination of MC 
simulations' noise, system size, and the Hooke- Jeeves search 
algorithm (a search direction that has been investigated and 
characterized as unfavorable is not reinvestigated to conserve 
CPU time). 




I Legacy cod9 

MC 500x500 1 000 | 

1.5 1 1 1 1 1 
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v LC = 10.3709 






Legacy code 




MC 500x500 1000 



0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 



Figure 2: Results using Hooke- Jeeves algorithm through numer- 
ical integration of Eq.l (blue line) and using KMC 
simulation (green line), a) Optimal temporal profile 
of process reaction rate k, b) Evolution of NO cov- 
erage 9. 

We also used Implicit Filtering to compute a near optimal 
steady state switching profile for k. The method of implicit 
filtering uses consecutive bounded approximations of the Ja- 
cobian with variable step and limit to the maximum change 
of the design variable at each iteration. The timestepper pro- 
tocol used had d max = 0.005, N max = 16JVj, M max = 4M r , 
Ni = 126 x 126 and M r — 252 and by adaptively adjusting 
M r the noise magnitude criterion was satisfied. The result- 
ing near optimal time profile of k(t) is presented in Figure 
3a, while the near optimal path of NO coverage evolution 
9(t) is shown in Figure 3b for an averaged realization, and 
in Figure 3c for a single KMC realization with lattice size 
N = 100 x 100 and reporting horizon St = 0.0039. 

In Table 3 we present computational results obtained 
through incorporating the coarse timestepper in an implicit 
filtering algorithm. We also observe that a good scales selec- 
tion depends heavily on d max , as scales below a certain limit 
have an adverse effect on the computed near optimal path 
result (compare the results of the search when the lowest 
scale is 2~ 4 to 2~ 3 ). 



Table 3: Implicit Filtering search results 



Model 


A 

Umax 


scales 


Objective 


t [sj 


Legacy 






10.3709 


168 


KMC 


0.005 


2-°,..., 2" 3 


10.5461 


297 


KMC 


0.005 


2-°,...,2- 4 


10.7922 


282 


KMC 


0.0005 


2~ 3 ,...,2- 5 


10.4110 


3684 


KMC 


0.001 


2-°,...,2- 4 


10.4129 


6858 



* single CPU pentium IV at 2.0 GHz 



Our timestepper protocol can be combined with the search 
algorithm to solve successively the optimization program us- 
ing more refined scales. Specifically, an initial search, with 
large values for the scales, can take place at higher d max , 
followed by searches with gradually lower d max and smaller 
scales to refine the search for the optimal path; this approach 
may lead to computational savings. When using the KMC 
with d max = 0.005 and a lower scale of 2 -3 for an initial 
search, followed by a KMC simulation of the system with 
dmax — 0.0005 and lower scale of 2 -5 , we find a near op- 
timal path with v = 10.4110 in a total time of 3981 s. A 
search using KMC with d max — 0.001 and lower scale of 2 -4 
lead to computing a near optimal path of v = 10.4129 in 
6858 s (the results are shown in Table 3). 

The optimal switching path, shown in figure 3b, takes the 
(expected) phase point from 6 SS , S through the unstable sta- 
tionary state 9 ss .i as shown in Figure 3b. After this is accom- 
plished, we observe that the optimal k(t) trajectory rapidly 
converges back to k ss (see Figure 3a). The coarse phase 
point is now within the region of attraction of the steady 
state Ossj, and no particular switching action is needed to 
get us there. 

The cost function values presented in this section were com- 
puted, for reference purposes, by integrating the coarse sys- 
tem Eq.3 for the optimal switching profile found. During the 
optimal search using KMC simulations, the coarse process 
model was never used. Several optimization algorithms were 
explored in conjunction with the coarse timestepper, includ- 
ing Hooke-Jeeves, Nelder-Mead, Implicit-Filtering as well as 
Multilevel Coordinate Search (MCS). Hooke-Jeeves was pri- 
marily chosen due to the simplicity of the method and its 
relative convergence speed. The Implicit Filtering algorithm 
was mainly used with the coarse timestepper noise-reduction 
protocol. Enforcing an upper bound on the noise magnitude, 
along with the selection of the value of the scales provided 
an order of magnitude estimate of the error in the estimated 
derivatives during the Jacobian approximation. 



5 Conclusions 

We presented a computational methodology for the loca- 
tion of coarse near-optimal parameter policies (in particular, 
steady state switching policies) for systems for which macro- 
scopic, coarse evolution equations exist but are not available 
in closed form. The advantage of the proposed method lies 
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Figure 3: Results using Implicit Filtering algorithm through numerical integration of Eq.l (blue line) and using KMC simulation 
(green line), a) Optimal temporal profile of process reaction rate k, b) Evolution of NO coverage 9, c) Evolution of NO 
coverage 9 for a single KMC realization, iV; = 100 x 100, St = 0.0039. 



in the establishment, through the coarse timestepper, of a 
computational bridge between atomistic/ stochastic simula- 
tors and traditional (in particular, derivative free) optimiza- 
tion algorithms. The approach can be directly extended to 
systems with higher dimensional expected behavior (see for 
example [11]), and possibly, through matrix-free methods, to 
systems with infinite dimensional (spatially distributed, but 
dissipative) expected behavior [3]. Our current efforts focus 
on applying this methodology to the study of rare events and 
coarse optimal paths in computational chemistry (e.g. [7]). 
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